Transformer-based comparative multi-view illegal transaction detection

In recent years, the Ethereum platform has grown by leaps and bounds, and numerous unscrupulous individuals have used illegal transactions to defraud investors of large sums of money, causing billions of dollars of losses worldwide. Illegal transactions based on Ethereum smart contracts, such as money laundering, financial fraud and phishing, appear in an endless stream. Currently, illegal transactions are detected from only a single view, either the contract code view features or the account transaction view features of the smart contract, which is incomplete and not fully representative of the smart contract’s features. More importantly, a single-view detection model cannot accurately capture the global structure and semantic features between the Tokens of the view features. It is therefore particularly important that information is shared among all view features. In this paper, we investigate a Transformer-based comparative multi-view illegal transaction detection network (TranMulti-View Net). The model uses the Transformer to learn a multi-view fusion representation, which aims to maximise the fusion of the interaction information of different view features under the same condition. We first use the Transformer model to learn global structure and semantic features from the sequence of Tokens obtained by tokenising a view, capturing the remote dependencies of Tokens in the view features. We then share the contract code view features and the account transaction view features across all views, so that the views learn important semantic information from each other. In addition, we find that semi-supervised training of multi-view features using contrast learning outperforms prediction based on direct fusion of different view features, resulting in stronger correlation between view features. As a result, the underlying semantic information can be captured more accurately, leading to more accurate predictions of illegal transactions. The experimental results show that our proposed TranMulti-View Net obtains good detection results, with a Precision score of 98%.


Introduction
The advent of Bitcoin has allowed commerce on the Internet to move away from reliance on trusted third-party institutions to an electronic payment system based on cryptographic proofs, thereby freeing transactions from the inherent weaknesses of trust-based models. Bitcoin combines proven distributed technologies such as the Timestamp Server and Proof-of-Work (PoW) so that participants can transfer value without a trusted and authoritative third party. Blockchain, derived from Bitcoin, is a hot topic of research in industry and academia. The application scenarios of blockchain have also been extended to healthcare [1][2][3], the Internet of Things [4] and the economy [5], and blockchain is therefore also known as the next generation of the Internet.
Ethereum is an open source blockchain-based distributed platform that supports the transfer of Ether, the creation of smart contracts and the execution of smart contract code, and is known as Blockchain 2.0. The platform was launched by Vitalik Buterin [6] in 2015 and has been popular among investors and the general public since its inception. While the Ethereum platform offers the advantages of trust, security, distribution and transparency, it also adds to the complexity of blockchain technology. This makes it difficult for investors to interpret the business logic of smart contracts on Ethereum, since they have only the small amount of descriptive information published by the developers to understand the mechanics of the business. Because Ethereum is anonymous and lacks regulation, some speculators take advantage of these characteristics to carry out money laundering [7], financial fraud [8,9], phishing [10][11][12], Ponzi schemes [13,14] and other criminal activities on Ethereum.
A smart contract is a piece of code that enables a specific function to be performed. It is a protocol for the transfer of value between anonymous participants, which is automatically triggered when the pre-defined conditions of the protocol are met. Once executed, a smart contract is unchangeable and never expires, cannot be terminated manually and is not dependent on any third party. This has led many investors to believe that a continuously running smart contract project that continues to generate income is not subject to the risks of an illegal transaction. This is not the case, as many illegal transactions have been disguised as smart contracts. According to data from the blockchain analysis group Chainalysis [15], cryptocurrency fraud hidden in blockchain transactions caused 4.3 billion dollars in losses in 2019, counting only the reported cases. Detecting illegal transactions on Ethereum is therefore an urgent task.
The types of accounts in Ethereum include externally owned accounts and contract accounts. The behaviour of fraudsters is usually reflected in externally owned accounts and related transactions. Therefore, Bartoletti et al. [16,17] extracted appropriate features to detect Ponzi schemes based on historical transaction information, but this type of approach cannot identify fraudulent smart contracts early after their publication and requires a large amount of relevant account transaction information for model training. Smart contract scammers often use the auto-execution feature of smart contracts to publish convincing project plans, gaining the trust of investors by making them believe that they will receive bonuses in time. Therefore, Bian et al. [18] detect Ponzi schemes by analysing the opcodes, bytecodes and application binary interfaces (ABIs) of smart contracts as basic features. This method uses a separate channel for the opcodes, bytecodes and ABIs, and so lacks interaction between the pieces of information. The main problems in the current detection of illegal transactions in smart contracts are as follows [19]:
1. Information is lost. If the input source code is very long, information from the earlier parts of the input is lost in transit and the model cannot fully understand the semantic contextual information. A model that captures the long-range dependencies of view features is therefore needed.
2. Insufficient source code. Only about 25% of smart contracts in the Etherscan dataset have opcodes, bytecode and ABI. The absence of any of the opcodes, bytecodes and ABIs directly affects the detection results.
3. The difficulty of obtaining data for smart contracts. Labelling illegal transactions in Etherscan datasets is difficult, while traditional deep learning models require large-scale data to accurately identify faulty smart contracts.
In machine vision tasks, Hassani et al. [20] argue that humans can view the world through multiple sensory channels, for example seeing, hearing and feeling dogs, where each view can be noisy and incomplete, but important factors (such as physical, geometric and semantic factors) are often shared between all views. This approach is consistent with the induction theorem, which states that we can deepen our understanding of the semantic information of things by summarising multiple views of an observation. Inspired by this, in this paper we specifically investigate Ponzi scheme detection for multiple views. Our goal is to learn to capture the information shared between multiple smart contract view channels and to discard view-specific distractors. To do this, we use a Transformer-based multi-view contrast learning approach. In this approach, we first use the Vision Transformer (ViT) [21], Transformer [22] and Bidirectional Encoder Representation from Transformer (BERT) [23] models to learn feature embeddings of the contract code view features and the account transaction view features. The models obtain the global structure and semantic features of the view features and capture the remote dependencies between the view feature Tokens. We then use a contrast learning approach to map the view features: if the view features are all in the same category (illegal transaction or legal transaction), their feature embeddings are mapped to adjacent points (under a Euclidean distance measure), and view features in different categories are mapped to more distant points. Fig 1 shows an overview of our approach. Our main contributions are as follows:
• A combination of ViT, BERT and Transformer models is used to detect illegal transactions. In the ViT model we encode the bytecode features of the contract code view. In the BERT model we encode the opcode view features. The Transformer encodes the account transaction view features. The ViT, BERT and Transformer models encode multi-view features that capture long-term dependencies between view feature Tokens, deepening the understanding of the semantic information of Ponzi schemes.
• Semi-supervised learning is used to train the model. This method mainly adopts contrast learning, which projects the features of multiple views of the same category into a similar feature space. By using contrast learning, the model can extract better fused multi-view features and improve its understanding of semantic information.
• It can detect illegal accounts hidden in the contract code. Our model can extract features from the contract code view for prediction, and can detect illegal accounts in smart contracts even if one of the contract code view features and the account transaction view features is intentionally hidden.
• It can detect illegal accounts at the outset of the contract, before any financial loss is incurred. Our model is able to flag illegal transactions before they occur, which reduces user losses and makes it possible to build illegal transaction risk control platforms on top of it.
In Fig 1, we show examples of learning bytecode, opcode and account transaction view feature representations. The codes for each view can be concatenated to form a complete representation of the same class of features. We use $I_0$, $T_0$ and $J_0$ to denote the legal transaction smart contract views, where $I_{0f}$, $T_{0f}$ and $J_{0f}$ are the feature representations of the bytecode view, opcode view and account transaction view features of the legal transaction contract output by ViT, BERT and Transformer, respectively, and $I_{0e}$, $T_{0e}$ and $J_{0e}$ are the feature representations of the legal transaction view. Similarly, $I_1$, $T_1$ and $J_1$ denote the illegal transaction smart contract views, where $I_{1f}$, $T_{1f}$ and $J_{1f}$ are the feature representations of the bytecode view, opcode view and account transaction view features of the illegal transaction contract output by ViT, BERT and Transformer, respectively, and $I_{1e}$, $T_{1e}$ and $J_{1e}$ are the feature representations of the illegal transaction view.

Related work
In this section, we first review illegal transaction detection methods based on account transaction characteristics. Then, we provide an overview of illegal transaction detection methods based on smart contract codes.

Account-based trading methods for illegal transaction detection
Some of the largest cryptocurrencies have publicly available blockchains, so most work [24,25] now extracts features from the transaction history of externally owned accounts to detect fraud. Bartoletti et al. [17] conducted a comprehensive investigation of Ponzi schemes, using information such as the source code of smart contracts, users' gains and losses, and transaction history times to analyse Ponzi schemes and their implications from various perspectives. Wu et al. [10] detected fraud on Ethereum by mining account transaction information on the blockchain, first using trans2vec to extract transaction address features and then using a support vector machine (SVM) to classify the addresses as normal or abnormal. A first step was taken by Bartoletti et al. [16], who analysed the transaction data on the blockchain to determine the characteristics of 32 transaction histories. Toyoda et al. [26] found that analysing the frequency of each transaction pattern on the blockchain can provide a key feature for identifying Ponzi schemes. Although existing Ponzi scheme detection methods based on account transaction information have achieved some success, they require a large amount of account transaction data to fit the model and cannot predict Ponzi schemes at the beginning of the contract, before they have caused financial losses.

A smart contract code based approach to illegal transaction detection
Lin Liu et al. [9] construct Heterogeneous Graph Transformer Networks (S_HGTNs) suitable for smart contract anomaly detection to detect financial fraud on the Ethereum platform. For feature representation, they first extract features to construct a Heterogeneous Information Network (HIN) for smart contracts, use the relationship matrix obtained from the meta-paths learned in the Transformer network as the input of the convolution network, and finally use the node embeddings for classification tasks. Jiajing Wu et al. [10] propose a novel network embedding algorithm called trans2vec to extract the features of addresses for subsequent phishing identification. The authors then adopt a one-class support vector machine (SVM) to classify the nodes into normal and phishing ones. Qi Yuan et al. [11] propose a three-step framework. First, the Ethereum transaction network is built from the collected transaction records. Then, node2vec is used to extract features for subsequent phishing classification. Finally, a one-class SVM is used to distinguish whether an account is a phishing account. This model can propose a more targeted network embedding method. Weili Chen et al. [12] propose a graph-based cascade feature extraction method based on transaction records and a lightGBM-based Dual-sampling Ensemble algorithm to build the identification model. Shin Morishima [27] proposes a subgraph structure suitable for graphics processing units (GPUs) to accelerate anomaly detection on the blockchain through parallel processing.
To address the problems of account transaction-based illegal transaction detection methods, researchers [28,29] have used smart contract code-based Ponzi detection methods, which can detect Ponzi schemes before the smart contract code is first released. Chen et al. [30], for example, explore contract trading rules by analysing currency flow graphs, smart contract creation and smart contract invocation for transactions on Ethereum. Chen et al. [13] first obtained data on 200 smart Ponzi schemes by collecting open source smart contracts on Ethereum, and then built a Ponzi scheme classification model using the transaction history of the smart contracts and the characteristic information of their operation codes. Torres et al. [31] study honeypot contracts, in which smart contracts containing hidden traps are deployed to lure users; they analyse the opcodes and impact of such contracts on Ethereum, using symbolic execution and explicit heuristics to classify honeypot contracts. Jung et al. [32] extracted Gini time and opcode features of transactions in smart contracts from the compiled codes and transaction design functions on the Ethereum website, and fed them into a classification model to detect Ponzi scheme behaviours in smart contracts. Bian et al. [18] propose an image-based Ponzi scheme detection method that converts the bytecode sequence, opcode frequency and application binary interface (ABI) into grayscale images as input to the detection model. Although detection methods based on smart contract code are able to predict Ponzi schemes before the smart contract code is released by analysing features such as the opcodes, the opcode, bytecode and ABI have different code lengths, and it is difficult to automatically learn the rich hidden semantic features from the long Token sequences of the view features. In addition, the accuracy of the model is significantly reduced when the opcodes, bytecodes and ABI are hidden on Ethereum sites.

Methods
In this section, we describe in detail the proposed Transformer-based comparative multi-view illegal transaction detection network model (TranMulti-View Net). Specifically, we first introduce the pre-processing of the account transaction view features, the bytecode view and the opcode view. Then, we describe the basic architecture of TranMulti-View Net. Finally, we introduce the relevant methods in TranMulti-View Net: the Transformer and contrast learning.

Data processing
In order to build a valid illegal transaction detection model, we used the smart contract dataset provided by Farrugia et al. [33], which contains 2179 illegal transaction contracts and 7662 legal transaction contracts. We divide the dataset into a training set and a test set, with the training set accounting for 90% and the test set for 10%.
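As a rough illustration of the 90%/10% split, the sketch below shuffles contract indices within each class and holds out about a tenth of each. The per-class (stratified) split, the fixed seed and the function name `split_indices` are assumptions made here for illustration; the text above states only the overall proportions.

```python
import random

def split_indices(n_illegal=2179, n_legal=7662, test_frac=0.10, seed=0):
    """Return (train, test) lists of (index, label) pairs; label 1 = illegal."""
    rng = random.Random(seed)
    train, test = [], []
    offset = 0
    for label, count in ((1, n_illegal), (0, n_legal)):
        ids = list(range(offset, offset + count))
        rng.shuffle(ids)
        n_test = round(count * test_frac)  # hold out about 10% of each class
        test += [(i, label) for i in ids[:n_test]]
        train += [(i, label) for i in ids[n_test:]]
        offset += count
    return train, test
```

Splitting per class keeps the ratio of illegal to legal contracts roughly the same in both sets, which matters for a dataset this unbalanced.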
Account transaction view features. The account transaction characteristics of an illegal transaction include the following five features:
• Long periods of account activity, high trading volumes and low contract balances, because Ponzi schemes and financial fraud always try to maintain the image of a high rate of return.
• Unstoppable: once deployed, smart contracts cannot be terminated by a third party or central authority.
• A very small percentage of accounts have returns greater than their investments, because the originator of the contract charges investors a high transaction fee.
• Illegal transaction accounts have a short interval between the time of the first and last transaction.
• Anonymity: the creators of illegal transaction accounts can themselves remain anonymous or hide the contract code.
Each account in this dataset therefore contains a total of 42 features, most of which are similar to those proposed by Chen [18] and Hirshman [19], such as known rate, balance, N-investment, N-payment, difference index, paid rate and N-maxpay.
Bytecode view. Publicly available smart contract code can be obtained directly from Ethereum and compiled to obtain bytecode. The bytecode view of a smart contract is represented by a string of hexadecimal numbers. We convert the bytecode into binary numbers in order and convert the binary numbers into pixels to produce a 224 × 224 grayscale image, reducing the time taken by the model to extract features of the bytecode view.
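The conversion can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes one byte maps to one grey level (0 to 255), that the hexadecimal string has an even number of digits, and that short inputs are zero-padded and long inputs truncated. The function name `bytecode_to_image` is hypothetical.

```python
def bytecode_to_image(bytecode_hex, side=224):
    """Map a hex bytecode string to a side x side grid of 0-255 pixel values."""
    if bytecode_hex.startswith("0x"):
        bytecode_hex = bytecode_hex[2:]
    pixels = list(bytes.fromhex(bytecode_hex))  # one byte -> one grey level
    n = side * side
    # Assumption: truncate long contracts and zero-pad short ones.
    pixels = pixels[:n] + [0] * max(0, n - len(pixels))
    return [pixels[r * side:(r + 1) * side] for r in range(side)]
```

For example, `bytecode_to_image("0x6080604052")` places the five bytes at the start of the first row and fills the rest of the image with zeros.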
Opcode view. Based on the opcodes of smart contracts obtained from Ethereum, we build a dictionary of English words in which each distinct opcode corresponds to exactly one word, and each word sequence is enclosed by the Tokens [SOS] and [EOS].
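A minimal sketch of such a dictionary is shown below. The `[PAD]` token, the fixed-length padding and truncation, and the function names `build_vocab` and `encode` are assumptions added for illustration; only the [SOS] and [EOS] markers come from the description above.

```python
def build_vocab(opcode_sequences):
    """Assign one integer id per distinct opcode, after the special tokens."""
    vocab = {"[PAD]": 0, "[SOS]": 1, "[EOS]": 2}
    for seq in opcode_sequences:
        for op in seq:
            vocab.setdefault(op, len(vocab))
    return vocab

def encode(seq, vocab, max_len=16):
    """Wrap a sequence in [SOS]/[EOS], truncate, and pad to max_len ids."""
    body = [vocab[op] for op in seq][: max_len - 2]
    ids = [vocab["[SOS]"]] + body + [vocab["[EOS]"]]
    return ids + [vocab["[PAD]"]] * (max_len - len(ids))
```

The small default `max_len` is only for readability; a real opcode sequence would need a much larger limit.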

Transformer-based comparative multi-view illegal transaction detection network model
Fig 2 shows illegal transaction detection based on TranMulti-View Net. In TranMulti-View Net, the pre-processed opcode views, account transaction view features and bytecode views are passed to the opcode encoder, account transaction feature encoder and bytecode encoder respectively for encoding and feature mapping of the multiple views. Then, using the contrast learning approach that we elaborate on in Section 3.4, views of the same class are distributed in the same feature space, a fully connected layer is attached as the last layer of the model, and the cosine similarity of the view features is used together with the classification results to predict illegal transactions.
Opcode encoder. During detection, we pass the opcode view into the opcode encoder, for which we use the BERT model. In the BERT model we use a Transformer with 8 attention heads and 12 layers of width 512, which we describe in detail in the Transformer section. In the Transformer model, the account transaction view features are normalised to establish long-range dependencies between feature graph Tokens and linearly projected into the multi-view embedding space. Token Embeddings uses a word embedding model.

PLOS ONE
In Token Embeddings, a sequence of dictionary-processed words is learned as word embeddings. Positional Embeddings represent the position information of words in a sentence, thus learning the text structure information of the opcode view. In addition, we add a masked self-attention mechanism to the opcode view in the opcode encoder, which enhances the modelling capability of the model and helps BERT extract key features from the view.
Bytecode encoder. We pass the bytecode view into the bytecode encoder for linear mapping. In the bytecode encoder we use the ViT model, with the same Transformer parameters as in the BERT model. The Transformer model accepts a 1D sequence of embeddings as input. To process the 2-dimensional bytecode image, we use Patch Embedding to map the image: $x \in \mathbb{R}^{H \times W \times C}$ is flattened into a series of tokens $\mathrm{token} \in \mathbb{R}^{N \times (P^{2} \cdot C)}$, where (H, W) denotes the resolution of the original image, C is the number of channels, (P, P) denotes the resolution of each image Token, and $N = HW/P^{2}$ is the effective sequence length of the Transformer input in ViT. In addition, ViT also contains Position Embedding, which adds relative location information to each patch and reduces the loss of location information in patch feature extraction.
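To make the token count concrete, the sketch below splits a single-channel image into flattened patches. The patch size P = 16 is an assumption (the value is not stated above), and the learned linear projection that follows patch flattening in ViT is omitted. The function name `patchify` is hypothetical.

```python
def patchify(image, patch=16):
    """Split an H x W grid (list of rows) into flattened patch x patch tokens."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image must tile evenly"
    tokens = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tokens.append([image[r][c]
                           for r in range(top, top + patch)
                           for c in range(left, left + patch)])
    # N = H*W / patch**2 tokens, each of length patch**2 * C (here C = 1)
    return tokens
```

Under this assumption, a 224 × 224 grayscale bytecode image yields N = 224 · 224 / 16² = 196 tokens of length 256.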
The account transaction feature encoder. We also use a Transformer model to encode the account transaction feature information, with the same Transformer parameters as in the ViT model. We did not include Position Embedding in the account encoder: the input account transaction features have only 45 dimensions, so we did not consider Position Embedding meaningful there.
In Fig 2, the method uses the opcode encoder, bytecode encoder and account transaction feature encoder to jointly train feature extractors and linear classifiers for the bytecode view, opcode view and account transaction view features in order to predict illegal transactions in smart contracts. The encoders are trained in a supervised way using contrast learning, so that the bytecode encoder, the opcode encoder and the account transaction feature encoder make full use of the information shared by the multi-view features (bytecode view, opcode view and account transaction view features).

Transformer
The long source code in smart contracts leads to a loss of feature information during feature extraction, which prevents the model from fully understanding the semantic contextual information. The Transformer processes the input data in parallel and effectively solves the long-range dependency problem, significantly reducing training and prediction time. The Transformer has demonstrated significant benefits in data processing in a number of areas, such as rain removal [34] and target recognition [35]. The Transformer model consists of a Position Embedding module, a multi-head attention mechanism and a feed-forward network.
Multi-head attention mechanism: The self-attention mechanism plays the major role in the multi-head attention mechanism. It learns feature information through the K, Q and V vectors, where the K and V vectors record the information already learned and the attention weights are obtained by querying with Q. The output vector of the self-attention mechanism is calculated as shown in Eq (1).
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \tag{1}$$
where K and Q are used to calculate the similarity between view features and obtain the similarity weight matrix, and $d_k$ is the dimensionality of the K vectors; the similarity weight matrix is scaled by $\sqrt{d_k}$ to keep the inner products from becoming too large. The Softmax function is used to normalise the similarity weights. The normalised weights are used to form a weighted sum of the corresponding V to obtain the attention output. The self-attention mechanism can generate weights for different connections "dynamically", and thus handle long-range dependencies of view features. The multi-head attention mechanism is a combination of h self-attention mechanisms. Each self-attention module attends to the same K, Q and V, and each module corresponds to a subspace of the final output sequence. The output sequences are independent of each other, so the multi-head attention module can attend to different information at different locations in the representation subspaces at the same time. The multi-head attention mechanism is calculated as shown in Eqs (2) and (3).
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^{O} \tag{2}$$
$$\mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V}) \tag{3}$$
As shown in Eqs (2) and (3), h means that there are h sets of K, Q and V vectors. W denotes the weight parameters for K, Q and V, which are different for each group. Concat means that the outputs of the self-attention mechanisms are concatenated. The concatenated result is multiplied by a weight matrix W to obtain the output of the multi-head attention mechanism.
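The attention computation described above can be sketched in plain Python as follows. This is an illustrative sketch only: `multi_head` gives each head one slice of the feature dimension and omits the learned projection matrices and the output weight matrix, which is a simplification assumed here to keep the example short.

```python
import math

def softmax(row):
    """Normalise a list of scores into weights that sum to 1."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention for lists of row vectors."""
    d_k = len(K[0])
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)  # one weight per key
        out.append([sum(w * v[j] for w, v in zip(weights, V))
                    for j in range(len(V[0]))])
    return out

def multi_head(X, h):
    """Self-attention with h heads; learned projections are omitted (assumption)."""
    d = len(X[0]) // h
    heads = []
    for i in range(h):
        part = [x[i * d:(i + 1) * d] for x in X]  # this head's feature slice
        heads.append(attention(part, part, part))
    # Concatenate the head outputs position by position.
    return [[v for head in heads for v in head[t]] for t in range(len(X))]
```

Because every output row is a softmax-weighted average over all positions, each Token can draw on any other Token in the sequence regardless of distance.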
Feedforward network: The feedforward network in the Transformer consists of a Multilayer Perceptron (MLP) layer, a GRU [36] layer and a ReLU activation function. The GRU makes it easier for the model to learn the long-range dependencies of view features.

Contrast learning
For multi-view features, we need to express the view features of the same category in the same space, so that the model can still perform well in predicting illegal transactions when any one of the views is missing. Our model therefore incorporates a contrast learning approach to learn the multiple view features (account transaction view features, opcode view features and bytecode view features). This makes it possible to predict illegal transactions as long as one or more views are available. Contrast learning ensures consistency in the semantic information expressed by views of the same category and reduces the differences between them. By incorporating contrast learning into training, the multi-view features can be distributed in the same feature space. In this model, we extract view features from I for the bytecode view, T for the opcode view and J for the account transaction view. View features are extracted as follows:
where $\mathrm{sim}(u, v) = u^{T} v / (\|u\| \|v\|)$, B represents the training batch and τ takes the value 0.5. When using the semi-supervised training method, in addition to the contrast learning loss function, we also fuse the features $I_f$, $T_f$ and $J_f$; the fused features are fed into a fully connected layer and a classification loss function is trained.
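For intuition, the sketch below implements the cosine similarity sim(u, v) and one common label-aware contrastive loss with temperature τ = 0.5. The loss form (an InfoNCE-style ratio in which embeddings with the same label act as positives) is an assumption chosen for illustration and is not claimed to match the paper's own equations. The function name `contrastive_loss` is hypothetical.

```python
import math

def sim(u, v):
    """Cosine similarity u^T v / (||u|| ||v||)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(embeddings, labels, tau=0.5):
    """Mean over anchors of -log(sum_pos exp(sim/tau) / sum_others exp(sim/tau))."""
    losses = []
    for i, (z_i, y_i) in enumerate(zip(embeddings, labels)):
        pos = total = 0.0
        for j, (z_j, y_j) in enumerate(zip(embeddings, labels)):
            if i == j:
                continue
            e = math.exp(sim(z_i, z_j) / tau)
            total += e
            if y_i == y_j:  # same category: treated as a positive pair
                pos += e
        if pos > 0:  # skip anchors with no same-class partner in the batch
            losses.append(-math.log(pos / total))
    return sum(losses) / len(losses) if losses else 0.0
```

The loss is small when embeddings of the same category point in similar directions and large when same-category embeddings are no closer than those of the other category.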
In the illegal transaction contract data, the number of legal transaction contracts is much larger than the number of illegal transaction contracts, resulting in an unbalanced sample distribution during training and making the detection loss vulnerable to domination by the loss of the legal transaction (non-Ponzi) contract samples. We therefore used Focal Loss [38] to reduce the impact of legal transaction contracts. The cross-entropy function is weighted so that the total loss function is controlled by Focal Loss. This is calculated as shown in Eq (12).
$$\mathrm{Focal\ loss}(p, y) = \begin{cases} -\alpha (1 - p)^{\gamma} \log p, & y = 1 \\ -(1 - \alpha)\, p^{\gamma} \log (1 - p), & y = 0 \end{cases} \tag{12}$$
where α ∈ [0, 1] is a weighting factor that balances the two classes when illegal transaction contracts are fewer. The larger the value of α, the larger the loss contributed by the samples with y = 1. γ is a modulating factor that reduces the weight of legal transaction samples and concentrates training on illegal transaction samples. p ∈ [0, 1] represents the predicted class probability of the sample, and y denotes the label of the sample: y = 1 represents an illegal transaction sample and y = 0 a legal transaction sample. Ultimately, the specific formula for the loss function of the model in this paper is shown in Eq (13).
It is worth noting that, in Eq (13), if we use semi-supervised learning with η = 0.1 and γ = 0.1, then λ = 1. Fig 3 shows the core pseudocode of TranMulti-View Net, which uses supervised and unsupervised training together with multi-view contrast learning to achieve accurate detection of illegal transaction contracts.
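The focal loss of Eq (12) can be sketched for a single sample as follows. The default values α = 0.25 and γ = 2 are placeholder assumptions rather than the settings used in this paper, and the clipping constant `eps` is added only to avoid taking the logarithm of zero.

```python
import math

def focal_loss(p, y, alpha=0.25, gamma=2.0, eps=1e-12):
    """Focal loss for one sample; p is the predicted probability of y = 1."""
    p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
    if y == 1:  # illegal transaction sample
        return -alpha * (1.0 - p) ** gamma * math.log(p)
    return -(1.0 - alpha) * p ** gamma * math.log(1.0 - p)  # legal sample
```

With γ = 0 and α = 1 the positive branch reduces to ordinary cross entropy, while larger γ shrinks the loss of samples that are already classified confidently.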

Experimental results and analysis
In this section, the results of our experiments are presented. First, we describe in detail our experimental parameter settings and the evaluation metrics of the experiments. Then, our model is compared with other state-of-the-art models for detecting illegal transactions. Finally, we perform an ablation experiment to demonstrate the key conditions that affect the effectiveness of our model.

Experimental setup
Parameter settings. The PyTorch deep learning framework was used to build the model for this experiment. The experiments were run on two NVIDIA Tesla T4 graphics cards and an Intel i7-8700 CPU. Before training, the image of the bytecode view is scaled to 216 × 216 and augmented with a series of random flips, distortions and rotations. When building the dictionary sequence of opcodes, we build the opcode sequence to a length of 15,000 dimensions. We use transfer learning to speed up training. The BERT and ViT models are first frozen and trained for 10 iterations, with an initial learning rate of 0.001, a decay of 0.95 times every two iterations, a momentum of 0.9, a decay factor of 0.0005 and a batch size of 12. After 10 iterations the models are unfrozen and training continues with the initial learning rate set to 0.0001, the same learning rate decay strategy and a batch size of 6. Training was stopped at 30 epochs, once the loss had converged.
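The two-phase schedule can be summarised in a small helper. This sketch assumes that an "iteration" above corresponds to one epoch and that the decay count restarts when the backbones are unfrozen; neither point is stated explicitly in the text. The function name `training_plan` is hypothetical.

```python
def training_plan(epoch):
    """Return (learning_rate, batch_size, backbones_frozen) for a 0-based epoch."""
    frozen = epoch < 10
    base_lr, batch = (1e-3, 12) if frozen else (1e-4, 6)
    # Decay by a factor of 0.95 once every two epochs within the current phase.
    steps = (epoch if frozen else epoch - 10) // 2
    return base_lr * 0.95 ** steps, batch, frozen
```

For example, `training_plan(0)` gives the frozen-phase starting point (learning rate 0.001, batch size 12), and `training_plan(10)` gives the first unfrozen epoch (learning rate 0.0001, batch size 6).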
Metrics. In order to compare the performance of our model with other models in detecting and identifying illegal transactions, three metrics, namely Precision, Recall and F-score, were used to measure the performance of the model. These three metrics are defined as follows.
$$\mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}, \qquad \text{F-score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where TP denotes the number of samples that are actually positive and are discriminated as positive, FP denotes the number of samples that are actually negative but are discriminated as positive, and FN denotes the number of samples that are actually positive but are discriminated as negative.
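As a small worked example of these metrics, the sketch below counts TP, FP and FN from label lists in which 1 marks an illegal transaction. Returning 0.0 when a denominator is zero is a convention assumed here, and the function name `precision_recall_f` is hypothetical.

```python
def precision_recall_f(y_true, y_pred):
    """Compute (precision, recall, F-score) with 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_score = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f_score
```

With true labels `[1, 1, 1, 0, 0]` and predictions `[1, 1, 0, 1, 0]`, there are two true positives, one false positive and one false negative, so all three metrics equal 2/3.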

Experimental results
To demonstrate the superiority of our model over current illegal transaction detection models, we compared TranMulti-View Net with detection models based on account transaction features, detection models based on smart contract code, and detection models based on both smart contract code and account transaction features. Table 1 shows the precision, recall and F-score of these methods. We can see from Table 1 that TranMulti-View Net performs best on all evaluation metrics, which demonstrates the superiority of our approach. Specifically, it achieves 98% precision, 97% recall and 97% F-score. Compared with the detection methods based on account transaction features, TranMulti-View Net is 4% higher than the previous highest precision, 3% higher than the previous highest recall and 4% higher than the previous highest F-score. Compared with the detection methods based on smart contract code features, it is 6% higher than the previous highest precision, 8% higher than the previous highest recall and 6% higher than the previous highest F-score. Compared with the detection methods based on opcodes and account transaction features, it is 2% higher than the previous highest precision, 4% higher than the previous highest recall and 5% higher than the previous highest F-score. In addition, we found that the detection models based on account transaction features performed poorly on almost all metrics, particularly in terms of recall and F-score. This may be related to the fact that several of the models based on account transaction features use traditional machine learning. When the model is trained using the smart contract code features, its performance improves, suggesting that smart contract code features are important in illegal transaction detection. However, these models do not reach the highest values because the smart contract code is too long and they do not learn the global structure and semantic features from the Token sequences. In addition, the remote dependencies of Tokens in the view features are not captured. When smart contract code features and account transaction features are combined, the performance of the model is further improved, illustrating the important role of multi-view fusion in detecting illegal transactions. However, the recall and F-score do not reach the highest values because, although the smart contract code features and account transaction features are fused, they are not tightly correlated with each other and the model does not capture the underlying semantic information well. In summary, TranMulti-View Net achieves the highest precision, recall and F-score compared with the above state-of-the-art models.

Ablation experiments
The effect of the different modules on the detection of the model. We analyse the effect of the different modules of TranMulti-View Net in Table 2. As rows 1, 2 and 3 of Table 2 show, adding the Transformer module to the opcode encoder and the account transaction feature encoder of TranMulti-View Net yields good recognition performance, indicating that the Transformer enables the model to capture more long-range dependencies between Tokens.
As row 4 of Table 2 shows, when we fuse the account transaction view features, opcode view features and bytecode view features, the TranMulti-View Net model achieves 95% Precision, 94% Recall and a 94% F-score, a clear improvement over single-view detection. This indicates that the more views are provided, the better the TranMulti-View Net model performs.
Because the number of illegal transaction contracts is insufficient, the classification error is large, which affects the detection of our model. We therefore added Focal Loss to the TranMulti-View Net model (row 5 of Table 2) and found that Recall and F-score improved by 1% compared with cross-entropy loss. This indicates that Focal Loss mitigates the data imbalance problem to some degree. The improvement is modest, however, because fusing features from multiple views already lets the view features learn feature information from each other, which itself alleviates the data imbalance problem to a certain extent.
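To make the role of Focal Loss concrete, the following sketch implements the standard binary form, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), next to plain cross-entropy. The values γ = 2 and α = 0.25 are the defaults commonly used in the focal loss literature and are assumptions here; the paper does not report the values used in TranMulti-View Net. The function names are hypothetical.

```python
import math

EPS = 1e-12  # keeps log() away from zero


def cross_entropy(p: float, y: int) -> float:
    """Binary cross-entropy for predicted probability p of the positive class."""
    p_t = p if y == 1 else 1.0 - p
    return -math.log(max(p_t, EPS))


def focal_loss(p: float, y: int, gamma: float = 2.0, alpha: float = 0.25) -> float:
    """Binary focal loss; gamma and alpha are assumed values, not the paper's."""
    p_t = p if y == 1 else 1.0 - p               # probability of the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha   # class-balancing weight
    # (1 - p_t)^gamma shrinks the loss of well-classified (easy) samples.
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(max(p_t, EPS))


if __name__ == "__main__":
    for p in (0.9, 0.6, 0.1):  # positive sample classified well, weakly, badly
        print(p, round(cross_entropy(p, 1), 4), round(focal_loss(p, 1), 4))
```

Relative to cross-entropy, the loss of a confidently correct sample is scaled down far more than that of a misclassified one, so the scarce, hard illegal transaction contracts contribute a larger share of the gradient.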

Finally, in order to train the model in a semi-supervised manner, we added contrastive learning to the TranMulti-View Net model and found that our method achieves the highest accuracy in identifying illegal transactions, identifying almost all of them. This suggests that the effectiveness of TranMulti-View Net depends not only on learning common features by comparison across views, but also on sharing feature space information between views. In summary, the optimal Recall and F-score of TranMulti-View Net result from improvements in several areas, which may provide new ideas for illegal transaction detection in smart contracts.
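The idea of learning common features by comparison across views can be illustrated with a generic cross-view contrastive objective. The sketch below is an InfoNCE-style loss in plain Python, in which the two view embeddings of the same contract form a positive pair and the embeddings of other contracts in the batch act as negatives. It is an illustrative assumption and may differ from the loss actually used in TranMulti-View Net. The function names, the temperature value and the toy embeddings are all hypothetical.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / max(norm, 1e-12)


def cross_view_contrastive_loss(view_a: Sequence[Sequence[float]],
                                view_b: Sequence[Sequence[float]],
                                temperature: float = 0.5) -> float:
    """InfoNCE-style loss: row i of view_a should match row i of view_b."""
    n = len(view_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(view_a[i], view_b[j]) / temperature for j in range(n)]
        # Numerically stable log-sum-exp over all candidates in the batch.
        peak = max(logits)
        log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_sum - logits[i]   # -log softmax of the positive pair
    return total / n


if __name__ == "__main__":
    code_view = [[1.0, 0.0], [0.0, 1.0]]      # toy contract code embeddings
    account_view = [[0.9, 0.1], [0.1, 0.9]]   # toy account transaction embeddings
    print(cross_view_contrastive_loss(code_view, account_view))
```

The loss is small when each contract's two views are closer to each other than to the views of other contracts, which is the behaviour the ablation attributes to contrastive learning.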
Performance analysis of contrastive learning. To explore the effect of contrastive learning on the multi-view features learned by the model, we remove the final fully connected layer of the model and apply the K-means algorithm directly to the feature layer for clustering analysis. As shown in Table 3, the model trained with the contrastive loss achieves higher Precision under K-means clustering, indicating that contrastive learning brings multi-view features of the same category closer together and pushes multi-view features of different categories further apart.
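The clustering analysis can be sketched as follows: run K-means with two clusters on the features taken before the final fully connected layer, map each cluster to a class, and compute Precision. This is a minimal pure-Python sketch under stated assumptions. Seeding the centroids with the first k points and mapping each cluster to its majority label are simplifications chosen here, not details reported in the paper. The function names and toy features are hypothetical.

```python
from typing import List, Sequence


def squared_distance(u: Sequence[float], v: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(u, v))


def kmeans(points: Sequence[Sequence[float]], k: int = 2, iterations: int = 20) -> List[int]:
    """Lloyd's algorithm; the first k points seed the centroids (a simplification)."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign every point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment


def cluster_precision(assignment: Sequence[int], labels: Sequence[int], positive: int = 1) -> float:
    """Map each cluster to its majority label, then compute Precision."""
    predicted = [0] * len(labels)
    for c in set(assignment):
        member_labels = [y for a, y in zip(assignment, labels) if a == c]
        majority = max(set(member_labels), key=member_labels.count)
        for i, a in enumerate(assignment):
            if a == c:
                predicted[i] = majority
    tp = sum(1 for p, y in zip(predicted, labels) if p == positive and y == positive)
    fp = sum(1 for p, y in zip(predicted, labels) if p == positive and y != positive)
    return tp / (tp + fp) if (tp + fp) else 0.0


if __name__ == "__main__":
    # Toy 2-D features: label 1 = illegal transaction contract, 0 = legal.
    features = [[0.0, 0.1], [5.0, 5.1], [0.2, 0.0], [5.2, 4.9], [0.1, 0.2], [4.9, 5.0]]
    labels = [0, 1, 0, 1, 0, 1]
    print(cluster_precision(kmeans(features), labels))
```

When the features of the two categories are well separated, as contrastive learning is intended to achieve, the clusters coincide with the classes and the clustering Precision is high.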
Time consumption for different modules. To investigate the efficiency of the model, we measured separately the time consumed by detection with the opcode view, the bytecode view, the account transaction view and multiple views in the TranMulti-View Net model. Fig 4 shows that multi-view detection with TranMulti-View Net takes slightly longer than the single-view detection methods, but because its accuracy is greatly improved, the additional time is acceptable. The time consumed by multi-view detection is not much higher than that of single-view detection, which further supports the use of our model for real-time detection of illegal transactions in contracts.
Feature interpretability analysis. To analyse the different view features extracted by the model, we used the t-SNE [46] technique to reduce the dimensionality of the features and visualise them. Fig 5 shows scatter plots of the features computed for illegal transaction smart contracts and legal transaction smart contracts. As can be seen in Fig 5, the view features of both illegal transaction and legal transaction contracts, whether single-view or multi-view, show a regular pattern of feature spacing.
As shown in Fig 5(a), in the distribution of bytecode view features, the features that affect the image classification results for illegal transaction contracts are mainly concentrated in the upper-right and lower-left regions, with a sporadic distribution on the upper-right side as a whole.
As shown in Fig 5(b) and 5(c), there is a clear difference between the distribution of opcode view features and that of account transaction view features, which shows that the two views differ in how they separate illegal transaction contracts from legal transaction contracts. In addition, in the account view feature distribution, the features of illegal transaction smart contracts and legal transaction smart contracts partially overlap, suggesting that detection based on the account view alone is not sufficient to detect illegal transaction smart contracts.

Table 3. Results of K-means clustering after removing the final fully connected layer of TranMulti-View Net. The best result is displayed in bold.

Model                      Precision
No Contrastive Learning    0.88

As shown in Fig 5(d), in contrast, when the TranMulti-View Net model is used to learn the multi-view feature distribution (without contrastive learning), the multi-view features are distributed in the lower-left and right regions. This shows that the model can make better decisions by using multi-view features.
As shown in Fig 5(e), when semi-supervised training is combined with the contrastive loss, the model learns almost the same distribution for multi-view features of the same category, and it more easily learns the difference between the feature distributions of illegal transaction smart contracts and legal transaction smart contracts.

Summary
In the area of smart contract-based illegal transaction detection, researchers have studied the detection of illegal transactions on Ethereum from different perspectives, but most of this work relies on either account transaction view features or smart contract code. While these methods have made a positive contribution to mitigating illegal activity on the blockchain network, they are all single-view detection methods. Such methods cannot accurately capture the global structure and semantic features of the view features of illegal transaction smart contracts, and they require a large number of illegal transaction smart contracts for training.
Combining ideas and methods from multiple domains, we propose a Transformer-based contrastive multi-view illegal transaction detection method. The approach first uses the Transformer to capture the long-range dependencies between Tokens in the view features and to extract global structural information and semantic features automatically. The model then uses semi-supervised learning with a contrastive loss function to extract information about the interactions between view features from the multi-view features of the dataset, which makes it possible to learn robust features. Extensive experiments have demonstrated that the proposed approach detects illegal transactions in smart contracts more effectively.
Our approach has two limitations. First, our model has 200 million parameters, which makes it difficult to deploy on many embedded devices, where the number of parameters is crucial. Second, it remains open whether our method can be trained in an unsupervised manner using more advanced data augmentation techniques; this is the focus of our next research. In future work, we will also continue to investigate the effectiveness of TranMulti-View Net for other tasks such as noise removal, target detection and image super-resolution, so that it can be applied to a wider range of detection tasks.